
16 research outputs found

Bringing Virtualization to the x86 Architecture with the Original VMware Workstation

Author: Bugnion Edouard
Devine Scott
Rosenblum Mendel
Sugerman Jeremy
Wang Edward Y.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 09/02/2013

This article describes the historical context, technical challenges, and main implementation techniques used by VMware Workstation to bring virtualization to the x86 architecture in 1999. Although virtual machine monitors (VMMs) had been around for decades, they were traditionally designed as part of monolithic, single-vendor architectures with explicit support for virtualization. In contrast, the x86 architecture lacked virtualization support, and the industry around it had disaggregated into an ecosystem, with different vendors controlling the computers, CPUs, peripherals, operating systems, and applications, none of them asking for virtualization. We chose to build our solution independently of these vendors. As a result, VMware Workstation had to deal with new challenges associated with (i) the lack of virtualization support in the x86 architecture, (ii) the daunting complexity of the architecture itself, (iii) the need to support a broad combination of peripherals, and (iv) the need to offer a simple user experience within existing environments. These new challenges led us to a novel combination of well-known virtualization techniques, techniques from other domains, and new techniques. VMware Workstation combined a hosted architecture with a VMM. The hosted architecture enabled a simple user experience and offered broad hardware compatibility. Rather than exposing I/O diversity to the virtual machines, VMware Workstation also relied on software emulation of I/O devices. The VMM combined a trap-and-emulate direct execution engine with a system-level dynamic binary translator to efficiently virtualize the x86 architecture and support most commodity operating systems. By relying on x86 hardware segmentation as a protection mechanism, the binary translator could execute translated code at near hardware speeds. The binary translator also relied on partial evaluation and adaptive retranslation to reduce the overall overheads of virtualization.
Written with the benefit of hindsight, this article shares the key lessons we learned from building the original system and from its later evolution.

Infoscience - École polytechnique fédérale de Lausanne

Virtualization performance

Author: Jennifer Anderson
Richard McDougall
Sugerman Jeremy
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

Interactive k-D Tree GPU Raytracing

Author: Daniel Reiter Horn
Jeremy Sugerman
Mike Houston
Pat Hanrahan

Over the past few years, the powerful computation rates and high memory bandwidth of GPUs have attracted efforts to run raytracing on GPUs. Our work extends Foley et al.’s GPU k-d tree research. We port their kd-restart algorithm from multi-pass, using CPU load balancing, to single pass, using current GPUs’ branching and looping abilities. We introduce three optimizations: a packetized formulation, a technique for restarting partially down the tree instead of at the root, and a small, fixed-size stack that is checked before resorting to restart. Our optimized implementation achieves 15-18 million primary rays per second and 16-27 million shadow rays per second on our test scenes. Our system also takes advantage of GPUs’ strengths at rasterization and shading to offer a mode where rasterization replaces eye ray scene intersection, and primary hits and local shading are produced with standard Direct3D code. For 1024x1024 renderings of our scenes with shadows and Phong shading, we achieve 12-18 frames per second. Finally, we investigate the efficiency of our implementation relative to the computational resources of our GPUs and also compare it against conventional CPUs and the Cell processor, which both have been shown to raytrace well.

CiteSeerX

GRAMPS: A programming model for graphics pipelines

Author: Jeremy Sugerman
Kayvon Fatahalian
Kurt Akeley
Pat Hanrahan
Solomon Boulos
Publication date: 01/01/2009

We introduce GRAMPS, a programming model that generalizes concepts from modern real-time graphics pipelines by exposing a model of execution containing both fixed-function and application-programmable processing stages that exchange data via queues. GRAMPS allows the number, type, and connectivity of these processing stages to be defined by software, permitting arbitrary processing pipelines or even processing graphs. Applications achieve high performance using GRAMPS by expressing advanced rendering algorithms as custom pipelines, then using the pipeline as a rendering engine. We describe the design of GRAMPS, then evaluate it by implementing three pipelines, that is, Direct3D, a ray tracer, and a hybridization of the two, and running them on emulations of two different GRAMPS implementations: a traditional GPU-like architecture and a CPU-like multicore architecture. In our tests, our GRAMPS schedulers run our pipelines with 500 to 1500 KB of queue usage at their peaks.

CiteSeerX

Brook for GPUs: Stream Computing on Graphics Hardware

Author: Daniel Horn
Ian Buck
Jeremy Sugerman
Kayvon Fatahalian
Mike Houston
Pat Hanrahan
Tim Foley
Publication date: 01/01/2004

In this paper, we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming coprocessor. We present a compiler and runtime system that abstracts and virtualizes many aspects of graphics hardware. In addition, we present an analysis of the effectiveness of the GPU as a compute engine compared to the CPU, to determine when the GPU can outperform the CPU for a particular algorithm. We evaluate our system with five applications: the SAXPY and SGEMV BLAS operators, image segmentation, FFT, and ray tracing. For these applications, we demonstrate that our Brook implementations perform comparably to hand-written GPU code and up to seven times faster than their CPU counterparts.

CiteSeerX

GRAMPS

Author: Jeremy Sugerman
Kapasi U.
Kayvon Fatahalian
Kurt Akeley
Owens J. D.
Pat Hanrahan
Solomon Boulos
Thies W.
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref